Abstract

Kernel spectral clustering (KSC) corresponds to a weighted kernel principal component analysis problem in a constrained
optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual
level. The dual formulation allows us to build a model on a representative subgraph of the large scale network in the training
phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample
extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the
structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing
distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large
scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically show that
real-world networks have a multilevel hierarchical organization which cannot be detected efficiently by several
state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We
show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and
coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks.
Introduction

Large scale complex networks are ubiquitous in the modern era. 
Their presence spans a wide range of domains including social 
networks, trust networks, biological networks, collaboration 
networks, financial networks etc. A complex network can be
represented as a graph G = (V, E) where V represents the vertices
or nodes and E represents the edges or interactions between these
nodes in the network. Many real-life complex networks are
scale-free [1], follow the power law [2] and exhibit community-like
structure. By community-like structure one means that nodes
within one community are densely connected to each other and 
sparsely connected to nodes outside that community. The large 
scale network consists of several such communities. This problem 
of community detection in graphs has received wide attention 
from several perspectives [3-14]. 

The community structure exhibited by real-world complex
networks often has an inherent hierarchical organization. This
suggests that there should be multiple levels of hierarchy in these
real-life networks with good quality clusters at each level. In other
words, there exist meaningful communities at refined as well as
coarser levels of granularity in this multilevel hierarchical system of
the real-life networks.

A state-of-the-art hierarchical community detection technique 
for large scale networks is the Louvain method [15]. It uses a 
popular quality function namely modularity (Q) [3,5,6,16] for 
locating modular structures in the network in a hierarchical 
fashion. Modularity measures the difference between a given 
partition of a network and the expectation of the same partition for 
a random network. By optimizing modularity, the method obtains the
modular structures in the network. However, it suffers from a
drawback, namely the resolution limit problem [17-19]. The issue
of resolution limit arises because the optimization of modularity 
beyond a certain resolution is unable to identify modules even as 
distinct as cliques which are completely disconnected from the rest 
of the network. This is because modularity fixes a global resolution 
to identify modules which works for some networks but not others. 
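
To make the quantity being optimized concrete, the following minimal sketch computes modularity for a fixed partition of an unweighted, undirected graph. It assumes the standard Newman-Girvan form Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j); the function name `modularity` and the toy graph are illustrative and are not taken from [15].

```python
import numpy as np

def modularity(adj, labels):
    """Newman-Girvan modularity of a partition (assumed standard definition)."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)                      # k_i
    two_m = degrees.sum()                          # 2m, twice the number of edges
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    expected = np.outer(degrees, degrees) / two_m  # k_i k_j / (2m)
    return float(((adj - expected) * same).sum() / two_m)

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

q_two_triangles = modularity(adj, [0, 0, 0, 1, 1, 1])
q_single_cluster = modularity(adj, [0, 0, 0, 0, 0, 0])
```

Splitting the toy graph into its two triangles gives a positive score, while placing every node in one cluster gives zero, which is the behaviour the description above relies on.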

Recently the authors of [20] showed that methods which use
variants of modularity to overcome the resolution limit problem
still suffer from the resolution limit. They propose an alternative
algorithm, namely OSLOM [21], to avoid the issue of resolution.
However, in our experiments we observe that OSLOM works well
for benchmark synthetic networks [4] but in the case of real-life
networks it is unable to detect quality coarse clusters. We also
evaluate another state-of-the-art hierarchical community detection
technique called the Infomap method [7]. The Infomap method
uses an information theoretic approach to hierarchical community
detection. It uses the probability flow of random walks as a
substitute for information flow in real-life networks. It then
fragments the network into modules by compressing a description
of the probability flow.

Figure 1. Steps undertaken by the MH-KSC algorithm.
doi:10.1371/journal.pone.0099966.g001

Spectral clustering methods [10-14] belong to the family of 
unsupervised learning algorithms where clustering information is 
obtained by the eigen-decomposition of the Laplacian matrix 
derived from the affinity matrix (S) for the given data. A drawback
of these methods is the construction of the large affinity matrix for 
the entire data which limits the feasibility of the approach to small 
sized data. To overcome this problem, a kernel spectral clustering 
(KSC) formulation based on weighted kernel principal component 
analysis (kPCA) in a primal-dual framework was proposed in [22]. 
The weighted kPCA problem is formulated in the primal in the 
context of least squares support vector machines [23] which results 
in eigen-decomposition of a centered Laplacian matrix in the dual. 
As a result, a clustering model is obtained in the dual. This model
is built on a subset of the original data and has a powerful
out-of-sample extension property. This property allows cluster affiliation
for unseen data.

The KSC method was applied for community detection in 
graphs by [24]. However, their subset and model selection 
approach was computationally expensive and memory inefficient. 
Recently, the KSC method was extended for big data networks in 
[25]. The method works by building a model on a representative 
subgraph of the large scale network. This subgraph is obtained by 
the fast and unique representative subset (FURS) selection 
technique as proposed in [26]. During the model selection stage,
the model parameters are estimated along with determining the
number of clusters k in the network. A self-tuned KSC model for
big data networks was proposed in [27]. The major advantage of
the KSC method is that it creates a model which has a powerful
out-of-sample extension property. Using this property, we can
infer community affiliation for unseen nodes of the whole network.

In [28], the authors used multiple scales of the kernel parameter
σ to determine the hierarchies in the data using the KSC approach.
However, in this approach the clustering model is trained for
different values of (k, σ) and evaluated for the entire dataset using
the out-of-sample extension property. Then, a map is created to 
match the clusters at two levels of hierarchy. As stated by the 
authors in [28], during a merge there might be some data points of 
the merging clusters that go into a non-merging cluster which is 
then forced to join the merging cluster of the majority. In this 
paper, we overcome this problem and generate a natural 
hierarchical organization of the large scale network in an 
agglomerative fashion. 

The purpose of hierarchical community detection is to
automatically locate multiple levels of granularity in the network
with meaningful clusters at each level. The KSC method has been
used effectively to obtain flat partitioning in real-world networks
[24,25,27]. In this paper, we exploit the structure of the
eigen-projections derived from the KSC model. The projections of the
validation set nodes in the eigenspace are used to create an iterative
set of affinity matrices resulting in a set of increasing distance
thresholds ($T$). Since the validation set of nodes is a representative
subset of the large scale network [26], we use these distance
thresholds ($t_i \in T$) on the projections of the entire network obtained
as a result of the out-of-sample extension property of the KSC
model. These distance thresholds, when applied in an iterative
manner, provide a multilevel hierarchical organization for the
entire network in a bottom-up fashion. We show that our proposed
approach is able to discover good quality coarse as well as refined
clusters for real-life networks.

There are some methods that optimize weighted graph cut 
objectives [29-31] to provide multilevel clustering for the large 
scale network. However, these methods suffer from the problem of 
determining the right value of k which is user defined. In real- 
world networks the value of k is not known beforehand. So in our 
experiments, we evaluate the proposed multilevel hierarchical 
kernel spectral clustering (MH-KSC) algorithm against the 
Louvain, Infomap and OSLOM methods. These methods 
automatically determine the number of clusters (k) at each level 
of hierarchy. Figure 1 provides an overview of the steps involved in the
MH-KSC algorithm and Figure 2 depicts the result of our
proposed MH-KSC approach on an email network (Enron).

In all our experiments we consider unweighted and undirected
networks. All the experiments were performed on a machine with
12 GB RAM and a 2.4 GHz Intel Xeon processor. The maximum size
of the kernel matrix that can be stored in the memory of
our PC is 10,000 × 10,000. Thus, the maximum cardinality of our
training and validation sets is 10,000. We use 15% of the total
nodes as the size of the training and validation sets (if less than 10,000)
based on experimental findings in [32]. We make use of the
procedure provided in [25] to divide the data into chunks in order
to extend our proposed approach to large scale networks. There
are several steps in the proposed methodology which can be
implemented in a distributed environment. We describe this in
detail later.

Kernel Spectral Clustering (KSC) Method 

We first summarize the notations used in the paper. 

Notations 

1. A graph is mathematically represented as $G = (V, E)$ where
$V$ represents the set of nodes and $E \subseteq V \times V$ represents the
set of edges in the network. Physically, the nodes represent
the entities in the network and the edges represent the
relationship between these entities.

2. The cardinality of the set $V$ is denoted as $N$.

3. The training, validation and test sets of nodes are given by $V_{tr}$,
$V_{valid}$ and $V_{test}$ respectively.

4. The cardinalities of the training, validation and test sets are
given by $N_{tr}$, $N_{valid}$ and $N_{test}$.

5. The adjacency list corresponding to each vertex $v_i \in V$ is
given by $x_i = A(:, i)$.

6. maxk is the maximum number of eigenvectors that we want
to evaluate.

7. $K(\cdot,\cdot)$ represents the positive definite kernel function.

8. The matrix $S$ represents the affinity or similarity matrix.

9. $P$ represents the latent variable matrix containing the
eigen-projections.

10. $h$ represents the $h$-th level of hierarchy and maxh stands for the
coarsest level of hierarchy.
Figure 2. Result of proposed MH-KSC approach on the Enron network.
(a) Affinity matrices created at different levels of hierarchy in left to right order. The x-axis and the y-axis in
each subgraph represent the size of the affinity matrix at each level of hierarchy. The number of block-diagonals
in each subgraph represents k at that level of hierarchy.
(b) Result of MH-KSC algorithm on Enron dataset. Circles which have the same colour are part of the same
cluster at the top most level of hierarchy. We depict clusters at 2 different levels of hierarchy using the toolbox
provided in [21].
doi:10.1371/journal.pone.0099966.g002
Algorithm 1: MH-KSC Algorithm

Data: Graph $G = (V, E)$ representing the large scale network.
Result: Multilevel hierarchical organization of the network.

1  Divide data into train, validation and test sets: $V_{tr}$, $V_{valid}$, $V_{test}$.
2  Construct dataset $D = \{x_i\}_{i=1}^{N_{tr}}$, $x_i \in \mathbb{R}^N$, from training set $V_{tr}$.
3  Perform KSC on $D$ to obtain the predictive model as in (4).
4  Obtain $P_{valid} = [e_1, \ldots, e_{N_{valid}}]^\top$ using the predictive model and $V_{valid}$.
5  Construct $S_{valid}^{(0)}(i,j) = CosDist(e_i, e_j) = 1 - \frac{e_i^\top e_j}{\|e_i\| \|e_j\|}$, $\forall e_i, e_j \in P_{valid}$.
6  Begin validation stage with: $h = 0$, $t^{(0)} = 0.15$.
7  $[C^{(0)}, k]$ = GreedyMaxOrder($S_{valid}^{(0)}$, $t^{(0)}$).   /* Algorithm 2, Figure 4 */
8  Add $t^{(0)}$ to the set $T$ and $C^{(0)}$ to the set $C$.
9  while $k > 1$ do
10     $h := h + 1$.
11     Create $S_{valid}^{(h)}$ using $S_{valid}^{(h-1)}$ and $C^{(h-1)}$ as shown in (6).
12     Calculate $t^{(h)}$ using equation (7).
13     $[C^{(h)}, k]$ = GreedyMaxOrder($S_{valid}^{(h)}$, $t^{(h)}$).
14     Add $t^{(h)}$ to the set $T$ and $C^{(h)}$ to the set $C$.
15 end
   /* Iterative procedure to get the set T. */
16 Obtain $P_{test}$ like $P_{valid}$ and begin with: $h = 1$, $t^{(1)} \in T$.
17 $[S_{test}^{(2)}, C^{(1)}, k]$ = GreedyFirstOrder($P_{test}$, $t^{(1)}$).   /* Algorithm 3, Figure 5 */
18 Add $C^{(1)}$ to the set $C$.
19 foreach $t^{(h)} \in T$, $h > 1$ do
20     $[C^{(h)}, k]$ = GreedyMaxOrder($S_{test}^{(h)}$, $t^{(h)}$).
21     Add $C^{(h)}$ to the set $C$.
22     Create $S_{test}^{(h+1)}$ using $S_{test}^{(h)}$ and $C^{(h)}$ as shown in (6).
23 end
24 Obtain the set $C$ for the test set and propagate cluster memberships iteratively from the 1st to the coarsest
   level of hierarchy.

Figure 3. Algorithm 1: MH-KSC Algorithm.
doi:10.1371/journal.pone.0099966.g003

11. Set $C$ comprises multilevel hierarchical clustering information.

12. Coarsest level of hierarchy corresponds to fine grained
clusters and finer levels of hierarchy correspond to coarse
clusters.



KSC methodology 

Given a graph $G$, we use the fast and unique representative
subset (FURS) selection [26] technique to obtain the training and
validation sets of nodes $V_{tr}$ and $V_{valid}$. FURS [26] is a deterministic
subgraph selection technique where nodes with high degree
centrality are greedily selected from most or all the communities in
the network. Nodes with high degree centrality are usually located


Algorithm 2: GreedyMaxOrder

Data: Affinity matrix $S$ and threshold $t$.
Result: Clustering information $C$ and number of clusters $k$.

1 $k = 1$ and totinst = 0.
2 while totinst $\neq |S|$ do
3     Find $i$ in range $(1, |S|)$ for which the number of instances $j$, s.t. $S(i,j) < t$, $j = 1, \ldots, |S|$, is
      maximum.
4     Put indices of instance $i$ and all instances $j$, s.t. $S(i,j) < t$, to $C_k$.
5     $k := k + 1$ and totinst := totinst + $|C_k|$.
6     Set all elements corresponding to the indices in $C_k$ to $\infty$ in $S$.
7     Add $C_k$ to the set $C$.
  end
8 $k := k - 1$.

Figure 4. Algorithm 2: GreedyMaxOrder.
doi:10.1371/journal.pone.0099966.g004
Algorithm 3: GreedyFirstOrder

Data: Projection matrix $P_{test}$, threshold $t^{(1)}$.
Result: Affinity matrix $S_{test}^{(2)}$, clustering information $C^{(1)}$ and $k$.

1  $k = 1$.
2  while $|P_{test}| \neq 0$ do
3      Select the 1st node and locate all nodes $j$ for which $CosDist(e_1, e_j) < t^{(1)}$.
4      Put all these instances in $C_k^{(1)}$ and add $C_k^{(1)}$ to the set $C^{(1)}$.
5      $k := k + 1$.
6      Remove these instances from $P_{test}$ to have a reduced $P_{test}$.
   end
   /* The affinity matrix ($S_{test}^{(1)}$) is not calculated as it would be unfeasible to store
      an N x N matrix in memory. */
7  $k := k - 1$.
8  for $i = 1$ to $|C^{(1)}|$ do
9      for $j = i + 1$ to $|C^{(1)}|$ do
10         Calculate $S_{test}^{(2)}(i,j)$ as the average $CosDist(\cdot,\cdot)$ between the eigen-projections of the
           instances in $C_i^{(1)}$ and $C_j^{(1)}$.
           /* Affinity matrix calculated for the first time. */

Figure 5. Algorithm 3: GreedyFirstOrder.
doi:10.1371/journal.pone.0099966.g005

at the center, away from the periphery of the network and can
better capture the inherent community structure. Since our goal is
to locate multilevel hierarchical clustering in the large scale
network, it is essential that the training and validation sets are
representative of the underlying community structure of the
network. A detailed description of the FURS approach and its
comparison with other state-of-the-art subset selection techniques
is provided in [26].

We use 15% of the total nodes as the size of the training and validation
sets (if less than 10,000, otherwise 10,000 nodes) based on
experimental findings in [32]. Firstly, we apply FURS to obtain
the training set of nodes $V_{tr}$. Once these nodes are selected in the
training set we remove these nodes from the network but maintain
the topology (degree distribution) of the network. We then apply
FURS again to obtain the validation set of nodes $V_{valid}$. Thus,
both sets $V_{tr}$ and $V_{valid}$ are selected such that they retain the
inherent community structure of the large scale network. We then
use the entire large scale network as the test set $V_{test}$.

For the $V_{tr}$ training nodes the dataset is given by $D = \{x_i\}_{i=1}^{N_{tr}}$,
$x_i \in \mathbb{R}^N$. The adjacency list $x_i$ can efficiently be stored in memory
as real-world networks are highly sparse and have limited
connections for each node $v_i$.

Given $D$ and maxk, the primal formulation of the weighted
kernel PCA [22] is given by:

$$\min_{w^{(l)}, e^{(l)}, b_l} \; \frac{1}{2} \sum_{l=1}^{maxk-1} w^{(l)\top} w^{(l)} - \frac{1}{2 N_{tr}} \sum_{l=1}^{maxk-1} \gamma_l \, e^{(l)\top} D_{\Omega}^{-1} e^{(l)} \tag{1}$$

such that $e^{(l)} = \Phi w^{(l)} + b_l 1_{N_{tr}}$, $l = 1, \ldots, maxk-1$,

where $e^{(l)} = [e_1^{(l)}, \ldots, e_{N_{tr}}^{(l)}]^\top$ are the projections onto the eigenspace
and $l = 1, \ldots, maxk-1$ indicates the number of score variables
required to encode the maxk clusters. However, it was shown in
[27] that we can discover more than maxk communities using these
maxk-1 score variables. $D_{\Omega}^{-1} \in \mathbb{R}^{N_{tr} \times N_{tr}}$ is the inverse of the degree
matrix associated to the kernel matrix $\Omega$ with
$\Omega_{ij} = K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$. $\Phi$ is the $N_{tr} \times d_h$ feature matrix
such that $\Phi = [\phi(x_1)^\top; \ldots; \phi(x_{N_{tr}})^\top]$ and $\gamma_l \in \mathbb{R}^+$ is the
regularization constant. We note that $N_{tr} \ll N$, i.e. the number of nodes in
the training set is much less than the total number of nodes in the
large scale network.

The kernel matrix $\Omega$ is constructed by calculating the similarity
between the adjacency lists of each pair of nodes in the training set.
Each element of $\Omega$, defined as $\Omega_{ij} = \frac{x_i^\top x_j}{\|x_i\| \|x_j\|}$, is calculated by
estimating the cosine similarity between the adjacency lists $x_i$ and
$x_j$ using notions of set intersection and union. This corresponds to
using a normalized linear kernel function $K(x,z) = \frac{x^\top z}{\|x\| \|z\|}$ [23].
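
For binary adjacency vectors the inner product $x_i^\top x_j$ equals the number of shared neighbours and $\|x_i\|^2$ equals the degree of $v_i$, so the kernel can be evaluated directly on neighbour sets. The sketch below illustrates this under that assumption; the helper names `cosine_kernel` and `kernel_matrix` and the toy adjacency lists are hypothetical, and the sketch needs only set intersections and set sizes.

```python
import math
import numpy as np

def cosine_kernel(nbrs_a, nbrs_b):
    """Normalized linear kernel for two nodes given as sets of neighbour ids."""
    if not nbrs_a or not nbrs_b:
        return 0.0  # convention chosen here for isolated nodes (an assumption)
    shared = len(nbrs_a & nbrs_b)                       # x^T z for binary vectors
    return shared / math.sqrt(len(nbrs_a) * len(nbrs_b))

def kernel_matrix(adjacency):
    """Symmetric kernel matrix Omega over a list of neighbour sets."""
    n = len(adjacency)
    omega = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            omega[i, j] = omega[j, i] = cosine_kernel(adjacency[i], adjacency[j])
    return omega

# Toy training set of 4 nodes (hypothetical adjacency lists).
adjacency = [{1, 2, 3}, {0, 2}, {0, 1, 3}, {0, 2}]
omega = kernel_matrix(adjacency)
```

Because only the neighbour sets are touched, the cost per kernel entry depends on node degrees and not on the total number of nodes, which is what makes the sparse adjacency-list representation attractive here.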
The primal clustering model is then represented by:

$$e_i^{(l)} = w^{(l)\top} \phi(x_i) + b_l, \quad i = 1, \ldots, N_{tr}, \tag{2}$$

where $\phi : \mathbb{R}^N \to \mathbb{R}^{d_h}$ is the feature map, i.e. a mapping to a
high-dimensional feature space of dimension $d_h$, and $b_l$ are the bias terms,
$l = 1, \ldots, maxk-1$. For large scale networks we can utilize the
explicit expression of the underlying feature map as shown in [25]
and set $d_h = N$. The dual problem corresponding to this primal
formulation is given by:

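$$D_{\Omega}^{-1} M_D \Omega \alpha^{(l)} = \lambda_l \alpha^{(l)}, \tag{3}$$

where $M_D$ is the centering matrix which is defined as
$M_D = I_{N_{tr}} - \frac{1_{N_{tr}} 1_{N_{tr}}^\top D_{\Omega}^{-1}}{1_{N_{tr}}^\top D_{\Omega}^{-1} 1_{N_{tr}}}$. The $\alpha^{(l)}$ are the dual variables
and the kernel function $K : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ plays the role of
similarity function. The dual predictive model is: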
$$\hat{e}^{(l)}(x) = \sum_{i=1}^{N_{tr}} \alpha_i^{(l)} K(x, x_i) + b_l, \tag{4}$$

which provides clustering inference for the adjacency list $x$
corresponding to the validation/test node $v$.

Figure 6. Result of MH-KSC algorithm on benchmark $Net_1$ network.
(a) Affinity matrices created at different levels of hierarchy for the $Net_1$ network. The x-axis and
the y-axis in each subgraph represent the size of the affinity matrix. The number of
block-diagonals in each subgraph represents k at that level of hierarchy.
(b) Original hierarchical network (left) and estimated hierarchical network (right) for the synthetic network.
The orientation and position of the communities might vary in the two plots.
doi:10.1371/journal.pone.0099966.g006
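
The training and out-of-sample steps can be illustrated with a small dense sketch: solve the eigenproblem $D_{\Omega}^{-1} M_D \Omega \alpha = \lambda \alpha$ on the training kernel, keep the leading maxk-1 eigenvectors, and project new nodes through their kernel rows. The function names `ksc_train` and `ksc_project` are hypothetical, and the bias expression $b_l = -\frac{1^\top D_{\Omega}^{-1} \Omega \alpha^{(l)}}{1^\top D_{\Omega}^{-1} 1}$ is an assumption taken from the usual KSC formulation rather than from the text above.

```python
import numpy as np

def ksc_train(omega, maxk):
    """Solve D^{-1} M_D Omega alpha = lambda alpha and keep maxk-1 eigenvectors."""
    n = omega.shape[0]
    d_inv = 1.0 / omega.sum(axis=1)                 # diagonal of D_Omega^{-1}
    ones = np.ones(n)
    # Centering matrix M_D = I - (1 1^T D^{-1}) / (1^T D^{-1} 1)
    m_d = np.eye(n) - np.outer(ones, d_inv) / d_inv.sum()
    mat = d_inv[:, None] * (m_d @ omega)            # D^{-1} M_D Omega
    eigvals, eigvecs = np.linalg.eig(mat)
    order = np.argsort(-eigvals.real)[: maxk - 1]   # largest eigenvalues first
    alpha = eigvecs[:, order].real
    # Bias terms (assumed standard KSC expression, see lead-in)
    bias = -(d_inv @ (omega @ alpha)) / d_inv.sum()
    return alpha, bias

def ksc_project(kernel_rows, alpha, bias):
    """Score variables e(x) = sum_i alpha_i K(x, x_i) + b for each row of kernel values."""
    return kernel_rows @ alpha + bias

# Toy kernel with two clear groups of three training nodes (hypothetical values).
omega = np.full((6, 6), 0.1)
omega[:3, :3] = 0.9
omega[3:, 3:] = 0.9
np.fill_diagonal(omega, 1.0)

alpha, bias = ksc_train(omega, maxk=2)
e_train = ksc_project(omega, alpha, bias)           # projections of the training nodes
```

For validation or test nodes, `kernel_rows` would hold the kernel values between each new node and the training nodes, so the projection cost grows with the number of training nodes per new node.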

Multilevel Hierarchical KSC 

We use the predictive KSC model in the dual to get the latent
variable matrix for the validation set $V_{valid}$, represented as
$P_{valid} = [e_1, \ldots, e_{N_{valid}}]^\top$, and for the test set $V_{test}$ (entire network),
denoted by $P_{test}$. In [27] the authors create an affinity matrix $S_{valid}$
using the latent variable matrix $P_{valid}$, which is an $N_{valid} \times (maxk-1)$
matrix, as:

$$S_{valid}(i,j) = CosDist(e_i, e_j) = 1 - \cos(e_i, e_j) = 1 - \frac{e_i^\top e_j}{\|e_i\| \|e_j\|}, \tag{5}$$

where the $CosDist(\cdot,\cdot)$ function calculates the cosine distance between
2 vectors and takes values in [0, 2]. Nodes which belong to
the same community will have $CosDist(e_i, e_j)$ closer to 0, $\forall i,j$ in the
same cluster. It was shown in [27] that a rotation of the $S_{valid}$
matrix has a block diagonal structure. This block diagonal
structure was used to identify the ideal number of clusters k in
the network using the concept of entropy and balanced clusters.

Determining the Distance Thresholds 

We propose an iterative bottom-up approach on the validation
set to determine the set of distance thresholds $T$. In our approach,
we refer to the affinity matrix at the ground level of hierarchy as
$S_{valid}^{(0)}$. The $S_{valid}^{(0)}$ matrix is obtained by calculating the $CosDist(\cdot,\cdot)$
between each pair of rows of the latent variable matrix $P_{valid}$ as
mentioned earlier. After several empirical evaluations, we observe
that the distance threshold at level 0 of hierarchy can be set to values
in [0.1, 0.2]. In our experiments we set $t^{(0)} = 0.15$. This
choice keeps the approach tractable for large scale networks,
as will be explained later.
Figure 7. Result of MH-KSC algorithm on benchmark $Net_2$ network.
(a) Affinity matrices created at different levels of hierarchy for the $Net_2$ network. The x-axis and
the y-axis in each subgraph represent the size of the affinity matrix. The number of
block-diagonals in each subgraph represents k at that level of hierarchy.
(b) Original hierarchical network (left) and estimated hierarchical network (right) for the synthetic network with
50,000 nodes. The orientation and position of the communities might vary in the two plots.
doi:10.1371/journal.pone.0099966.g007



We then use a greedy approach to select the validation node
with the maximum number of similar nodes in the latent space, i.e. we
select the projection $e_i$ which has the maximum number of
projections $e_j$ satisfying $S_{valid}^{(0)}(i,j) < t^{(0)}$. We put the indices of
these nodes in $C_1^{(0)}$, representing the 1st cluster at level 0 of
hierarchy. We then remove these nodes and the corresponding entries
from $S_{valid}^{(0)}$ to obtain a reduced matrix. This process is repeated
iteratively until $S_{valid}^{(0)}$ becomes empty. Thus, we obtain the set
$C^{(0)} = \{C_1^{(0)}, \ldots, C_q^{(0)}\}$ where $q$ is the total number of clusters at
ground level of hierarchy. The set $C^{(0)}$ has communities along
with the indices of the nodes in these communities.
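
The ground-level step can be sketched in a few lines: build the cosine-distance affinity matrix from the projections, then repeatedly pick the node with the most neighbours below the threshold and remove that group. The function names `cos_dist_matrix` and `greedy_max_order` and the toy projections are illustrative assumptions; removed nodes are tracked with a boolean mask, which is one possible way to realise the removal described above.

```python
import numpy as np

def cos_dist_matrix(proj):
    """Pairwise cosine distance 1 - cos(e_i, e_j) between rows of proj."""
    unit = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def greedy_max_order(affinity, threshold):
    """Greedy grouping: take the remaining row with most entries below threshold."""
    s = np.asarray(affinity, dtype=float)
    remaining = np.ones(s.shape[0], dtype=bool)
    clusters = []
    while remaining.any():
        # Pairs below the threshold, restricted to nodes not yet assigned.
        close = (s < threshold) & remaining[None, :] & remaining[:, None]
        counts = np.where(remaining, close.sum(axis=1), -1)
        seed = int(np.argmax(counts))
        members = close[seed].copy()
        members[seed] = True                 # the seed always joins its own cluster
        clusters.append(np.flatnonzero(members))
        remaining &= ~members
    return clusters

# Toy projections (hypothetical): three nodes near one direction, two near another.
proj = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.05], [0.0, 1.0], [0.1, 1.0]])
s_valid0 = cos_dist_matrix(proj)
clusters0 = greedy_max_order(s_valid0, threshold=0.15)
```

On the toy data the first cluster collects the three nodes pointing in roughly the same direction and the second cluster collects the remaining two.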



To obtain the clusters at the next level of hierarchy we treat the
communities at the previous level as nodes. We then calculate the
average cosine distance between these nodes using the information
present in them. At each level $h$ of hierarchy we create a new
affinity matrix as:

$$S_{valid}^{(h)}(i,j) = \frac{\sum_{m \in C_i^{(h-1)}} \sum_{l \in C_j^{(h-1)}} S_{valid}^{(h-1)}(m,l)}{|C_i^{(h-1)}| \times |C_j^{(h-1)}|} \tag{6}$$



where $|\cdot|$ represents the cardinality of the set. In order to determine
the threshold at level $h$ of hierarchy, we estimate the minimum
cosine distance between each individual cluster and the other
clusters (not considering itself). Then, we select the mean of these
values as the new threshold for that level to combine clusters. This
makes the approach different from the classical single-link
clustering where we combine two clusters which are closest to
each other at a given level of hierarchy and the average-link
agglomerative clustering where we combine based on the average
distance between all the clusters.

Table 3. Nodes (V), Edges (E) and Clustering Coefficients (CCF) for each network.

Network              Nodes        Edges        CCF
Facebook (Fb)        4,039        88,234       0.6055
PGPnet (PGP)         10,876       39,994       0.008
Cond-mat (Cond)      23,133       186,936      0.6334
Enron (Enr)          36,692       367,662      0.497
Epinions (Epn)       75,879       508,837      0.1378
Imdb-Actor (Imdb)    383,640      1,342,595    0.453
Youtube (Utube)      1,134,890    2,987,624    0.081

The reason for using the mean of these minimum cosine distance
values as the new threshold is that if we consider the minimum of
all the distance values then there is a risk of only combining 2
clusters at that level. However, it is desirable to combine multiple
sets of different clusters. Thus, the new threshold at level $h$ is set
as:

$$t^{(h)} = \mathrm{mean}\big(\min_j (S_{valid}^{(h)}(i,j))\big), \quad i \neq j \tag{7}$$

We use this process iteratively till we reach the coarsest cluster
where we have 1 cluster containing all the nodes. As a
consequence we obtain the hierarchical clustering
$C = \{C^{(0)}, \ldots, C^{(maxh)}\}$ automatically. As we move from one level
of hierarchy to another the value of the distance threshold increases
since we are merging large clusters at coarser levels of hierarchy.
We finally end up with a set of increasing distance thresholds $T$.
Requirements for Feasibility to Large Scale Networks 

The whole large scale network is used as the test set. The latent
variable matrix for the test set is obtained by out-of-sample
extensions of the predictive KSC model and defined as
$P_{test} = [e_1, \ldots, e_{N_{test}}]^\top$. Since we use the entire network as the test
set, $N_{test} = N$. The $P_{test}$ matrix is an $N \times (maxk-1)$
dimensional matrix. So, we can store this $P_{test}$ matrix in memory
but cannot create an affinity matrix of size $N \times N$ due to memory
constraints.

To make the approach feasible for large scale networks we impose the
condition that the maximum size of a cluster at ground level
cannot exceed 10,000 (depending on the available computer
memory) and the maximum number of clusters allowed at the
ground level is 10,000. This limits the size of the affinity matrix at
that level of hierarchy to be less than 10,000 × 10,000. It also affects
the choice of the initial value of the distance threshold $t^{(0)}$. If we set
$t^{(0)}$ too high ($\gg 0.2$) then the majority of the nodes at the ground level
in the test case will fall in one community, resulting in one giant
connected component. If we set the value of $t^{(0)}$ too low ($\ll 0.1$)
then we will end up with a lot of singleton clusters at the ground
level in the test case. In our experiments, we observed that
any value in the interval [0.1, 0.2] is a good choice for the initial
threshold value at level 0 of hierarchy. To be consistent we chose
$t^{(0)} = 0.15$ for all the networks.

Multilevel Hierarchical KSC for Test Nodes 

The validation set is a representative subset of the whole
network as shown in [26]. Thus, the threshold set $T$ can be used to
obtain a hierarchical clustering for the entire network. To make
the proposed approach self-tuned, we use the thresholds $t^{(i)} > 0.15$, $i > 0$,
from $T$ during the test phase.

In order to avoid creating the affinity matrix for the large
network we follow a greedy procedure. We select the projection of
the first test node and calculate its similarity with the projections of
all the test nodes. We then locate the indices ($j$) of those projections
s.t. $CosDist(e_1, e_j) < t^{(1)}$. If the total number of such indices is less
than 10,000 then we put them in cluster $C_1^{(1)}$, otherwise we select
the first 10,000 indices and place them in cluster $C_1^{(1)}$. This is due
to the constraint that the size of a cluster ($C_1^{(1)}$) at ground level
cannot exceed 10,000. We then remove the entries corresponding to
those projections in $P_{test}$ to obtain a reduced matrix. We perform
this procedure iteratively until $P_{test}$ is empty to obtain
$C^{(1)} = \{C_1^{(1)}, \ldots, C_r^{(1)}\}$ where $r$ is the total number of clusters at
hierarchical level 1. After the 1st level, we use the same procedure
that was used for the validation set, i.e. creating an affinity matrix at each
level using the cluster information along with the threshold set $T$ to
obtain the hierarchical structure in an agglomerative fashion. The
cluster memberships are propagated iteratively from the 1st level to
the highest level of hierarchy. The multilevel hierarchical kernel
spectral clustering (MH-KSC) method is described in Figure 3,
which refers to Algorithm 2 and Algorithm 3 in Figure 4 and
Figure 5 respectively.
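
A minimal sketch of this test-phase procedure is given below. It never forms the $N \times N$ matrix: each pass compares one seed projection against the remaining projections, and only the much smaller cluster-by-cluster matrix is built afterwards. The function name `greedy_first_order`, the `max_cluster_size` parameter and the toy projections are illustrative assumptions; the diagonal of the returned matrix is left at zero because only pairs of distinct clusters are computed.

```python
import numpy as np

def greedy_first_order(proj, threshold, max_cluster_size=10_000):
    """Ground-level clustering of test projections plus cluster-level affinities."""
    unit = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    remaining = np.arange(unit.shape[0])
    clusters = []
    while remaining.size > 0:
        # Cosine distance from the first remaining node to all remaining nodes.
        dist = 1.0 - unit[remaining] @ unit[remaining[0]]
        close = dist < threshold
        close[0] = True                                   # the seed joins its own cluster
        picked = np.flatnonzero(close)[:max_cluster_size] # cap the cluster size
        clusters.append(remaining[picked])
        keep = np.ones(remaining.size, dtype=bool)
        keep[picked] = False
        remaining = remaining[keep]
    q = len(clusters)
    s_next = np.zeros((q, q))
    for a in range(q):
        for b in range(a + 1, q):
            block = 1.0 - unit[clusters[a]] @ unit[clusters[b]].T
            s_next[a, b] = s_next[b, a] = block.mean()    # average pairwise distance
    return s_next, clusters

# Toy projections (hypothetical values).
proj = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.05], [0.0, 1.0], [0.1, 1.0]])
s_test2, clusters1 = greedy_first_order(proj, threshold=0.15)
_, capped = greedy_first_order(proj, threshold=0.15, max_cluster_size=2)
```

With the cap set to 2 in the toy example, the third node of the first group is left over and becomes the seed of its own cluster, which mirrors how the size constraint splits very large ground-level groups.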

Time Complexity Analysis 

The two steps in our proposed approach which require the 
maximum computation time are the out-of-sample extensions for 
the test set and the creation of the affinity matrix from the ground 
level clusters. 

Since we use the entire network as the test set, the time required for
out-of-sample extension is $O(N_{tr} \times N)$. Our greedy procedure to
obtain the clustering information at the ground level $C^{(1)}$ requires
$O(r \times N)$ computations where $r$ is the number of clusters at the 1st level
of hierarchy for the test set. This is because for each cluster $C_i^{(1)}$
we remove all the indices belonging to that cluster from
the matrix $P_{test}$. As a result the size of $P_{test}$ decreases till it reduces
to zero, resulting in $O(r \times N)$ computations. The affinity matrix
$S_{test}^{(2)}$ is a symmetric matrix so we only need to compute the upper
or the lower triangular matrix. The number of cluster-cluster
similarities that we have to calculate is $\frac{r \times (r-1)}{2}$ where the size of
each cluster at ground level can be maximum 10,000.
However, as shown in [25], we can perform the out-of-sample
extensions in parallel on $n$ computers and rows of the affinity
matrix can also be calculated in parallel, thereby reducing the
complexity by $\frac{1}{n}$.

Figure 8. Tree based visualization of the multilevel hierarchical organization prevalent in 2 real-life networks.
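
As a worked instance of the pair count for a symmetric $r \times r$ cluster affinity matrix, take the largest number of ground-level clusters permitted by the feasibility condition, $r = 10{,}000$ (the choice of this value is only an illustration of the upper bound):

```latex
\frac{r\,(r-1)}{2}\bigg|_{r = 10{,}000}
  = \frac{10{,}000 \times 9{,}999}{2}
  = 49{,}995{,}000
```

So even in this extreme case just under 50 million cluster-cluster averages are required, and these can be distributed across machines row by row.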
Figure 9. MH-KSC algorithm for the PGP network. Communities with same colour belong to one cluster.
Recoverable panel titles: (c) Few Micro and Few Macro Communities at Finer Levels; Some Pre-dominant Macro Communities at Finer Levels.
Experimental Results

We conducted experiments on 2 synthetic datasets obtained
from the toolkit in [4] and 7 real-world networks obtained from
the Stanford SNAP library (http://snap.stanford.edu/data/index.html).

Synthetic Network Experiments 

The synthetic networks are referred to as $Net_1$ and $Net_2$ and have
2,000 and 50,000 nodes respectively. The ground truth for these 2
benchmark networks is known at 2 levels of hierarchy. These 2
levels of hierarchy for these benchmark networks are obtained by
using 2 different mixing parameters, i.e. $\mu_1$ and $\mu_2$, for macro and
micro communities. We fixed $\mu_1 = 0.1$ and $\mu_2 = 0.2$ in our
experiments. Since the ground truth is known beforehand, we
evaluate the communities obtained by our proposed MH-KSC
approach using external quality metrics like Adjusted Rand
Index (ARI) and Variation of Information (VI) [33]. We also
evaluate the cluster information using internal cluster quality
metrics like Modularity (Q) [3] and Cut-Conductance (CC) [29].
We compare MH-KSC with Louvain, Infomap and OSLOM.

Figures 6 and 7 showcase the result of the MH-KSC algorithm on
$Net_1$ and $Net_2$ respectively. From Figures 6a and 7a, we
observe the affinity matrices generated corresponding to the test
set for $Net_1$ and $Net_2$ respectively. From Figures 6b and 7b, we
can observe the communities prevalent in the original network and
the communities estimated by the MH-KSC method for $Net_1$ and
$Net_2$ respectively. In $Net_1$ there are 9 macro communities and 37
micro communities while in $Net_2$ there are 13 macro communities
and 141 micro communities as depicted by Figures 6b and 7b.

Figure 10. Results of Louvain, Infomap and OSLOM methods for PGP network.
Recoverable panel titles: (c) Few Micro and Macro Communities at Finer Levels for Louvain Method;
(e) Many Micro Communities at Coarser Levels for OSLOM Method;
(f) Few Micro and Macro Communities at Fine Levels for OSLOM Method.
doi:10.1371/journal.pone.0099966.g010

Table 1 illustrates the first 10 levels of hierarchy for Net1 and
Net2 and evaluates the clusters obtained at each level of hierarchy
w.r.t. the quality metrics ARI, VI, Q and CC. Higher values of ARI
(close to 1) and lower values of VI (close to 0) represent good
quality clusters. Both these external quality metrics are normalized
as shown in [33]. Higher values of modularity (Q, close to 1) and
lower values of cut-conductance (CC, close to 0) indicate better
clustering information.
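The two internal metrics can be sketched as follows for an undirected, unweighted edge list. Modularity follows the usual Newman definition. For cut-conductance the sketch assumes the mean over clusters of cut(S) / min(vol(S), vol(V \ S)), which may not match the exact variant used in [29]. The function name is hypothetical.

```python
from collections import defaultdict

def modularity_and_conductance(edges, labels):
    """edges: iterable of undirected (u, v) pairs without self-loops.
    labels: dict mapping node -> cluster id.
    Returns (Q, mean conductance over clusters that touch at least one edge)."""
    m = 0
    internal = defaultdict(int)  # edges with both endpoints inside the cluster
    cut = defaultdict(int)       # edges leaving the cluster
    volume = defaultdict(int)    # sum of node degrees inside the cluster
    for u, v in edges:
        m += 1
        cu, cv = labels[u], labels[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    clusters = list(volume)
    q = sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters)
    conductances = []
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductances.append(cut[c] / denom if denom > 0 else 0.0)
    return q, sum(conductances) / len(conductances)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q, cc = modularity_and_conductance(edges, labels)
print(q, cc)
```

In this toy graph each triangle forms a cluster with one outgoing edge, so Q is clearly positive and the conductance is small, matching the interpretation given in the text.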



Table 2 provides the results of the Louvain, Infomap and OSLOM
methods and compares them with the best levels of hierarchy for
Net1 and Net2. The Louvain, Infomap and OSLOM methods require
multiple runs as each run results in a different partition. We
perform 10 runs and report the mean results in Table 2. From
Table 2, it can be observed that the best results for the Louvain
and Infomap methods generally occur at finer levels of hierarchy
w.r.t. the ARI, VI and Q metrics. Thus, these two methods work
well to identify macro communities. The Louvain method works
better than MH-KSC for Net2 at macro and micro level. However, it
cannot obtain similar quality micro communities when compared
with the MH-KSC method for Net1 as inferred from Table 2. The
Infomap method performs the worst among all the methods w.r.t.
detection of communities at coarser levels of granularity. OSLOM
performs well at locating both macro communities for Net1 and
micro communities for Net2 as observed from Table 2. It performs
better than any other method at locating micro communities for
Net2 w.r.t. the ARI and VI metrics. However, it performs worst
while trying to identify the macro communities for the same
benchmark network. MH-KSC performs best on Net1 while it
performs better at locating macro communities for Net2.

Figure 11. Representing the 2 best levels of hierarchy for Epn network w.r.t. modularity criterion.

doi:10.1371/journal.pone.0099966.g011

Real-Life Network Experiments 

We experimented on 7 real-life networks from the Stanford
SNAP datasets. These networks are anonymous networks and are
converted to undirected and unweighted networks before
performing experiments on them. Table 3 provides information
about topological characteristics of these real-life networks. The Fb
and Epn networks are social networks, PGP is a trust based
network, Cond is a collaboration network between researchers,
Enr is an email network, Imdb is an actor-actor collaboration
network and Utube is a web graph depicting friendship between
the users of YouTube.
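The conversion to undirected and unweighted networks mentioned above can be sketched as follows. The sketch assumes a whitespace-separated edge list with optional `#` comment lines and an optional third weight column. Dropping self-loops is an additional assumption, and the function name is hypothetical.

```python
def to_undirected_unweighted(edge_lines):
    """Parse 'u v [weight]' lines and return a sorted list of unique
    undirected edges, ignoring direction, weights and duplicates."""
    edges = set()
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        parts = line.split()
        u, v = parts[0], parts[1]  # any further column (e.g. a weight) is ignored
        if u == v:
            continue               # assumption: self-loops are dropped
        edges.add((u, v) if u < v else (v, u))
    return sorted(edges)

lines = ["# toy edge list", "1 2", "2 1", "2 3 0.7", "3 3"]
print(to_undirected_unweighted(lines))  # [('1', '2'), ('2', '3')]
```

Storing each edge with its endpoints in a canonical order makes the two directions of a reciprocal link collapse into a single undirected edge.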

In case of real-life networks the true hierarchical structure is not
known beforehand. Hence, it is important to show whether they
exhibit hierarchical organization, which can be tested by
identifying good quality clusters w.r.t. internal quality metrics
like Q and CC at multiple levels of hierarchy.

We showcase the results for 10 levels of hierarchy in a bottom-up
fashion for the MH-KSC method in Table 4. The finest level of
hierarchy has all nodes in one community and is not very
insightful. Clusters at finer levels of granularity comprise giant
connected components. So, it is more meaningful to give more
emphasis to fine grained clusters at coarser levels of hierarchy. To
show that real-life networks exhibit hierarchy we evaluate our
proposed MH-KSC approach in Table 4.

Figure 12. Representing the 2 best levels of hierarchy for Epn network w.r.t. cut-conductance criterion. Recoverable panel titles: (b) Best Result for Louvain method; (c) Best Result for Infomap approach; (d) Best Result for OSLOM.

doi:10.1371/journal.pone.0099966.g012

We compare the MH-KSC algorithm with Louvain [15], Infomap
[7] and OSLOM [21]. We perform 10 runs for each of these
methods as they generate a separate partition each time they
are executed. The mean results of the Louvain method are reported
in Table 5. Table 6 showcases the results for the Infomap and
OSLOM methods.

From Table 5 it is evident that the Louvain method works best
w.r.t. the modularity (Q) criterion. This aligns with its methodology
as it is trying to optimize Q. However, the Louvain method always
performs worse than the MH-KSC algorithm w.r.t. cut-conductance
(CC) as observed from Tables 4 and 5. Another issue with the
Louvain method is that, except for the Fb and PGP networks, it is
not able to detect high quality clusters at finer levels of
granularity (<1000 clusters). This is attributed to the resolution
limit problem suffered by the Louvain method. From Table 6 we
observe that the Infomap method produces only 2 levels of
hierarchy. In most of the cases, the clusters at one level of
hierarchy perform well w.r.t. only 1 quality metric, the exceptions
being the PGP and Cond networks. The difference between the
quality of the clusters at the 2 levels of hierarchy is quite
drastic. This reflects that the Infomap method is not very
consistent w.r.t. various quality metrics.

We compare the performance of the MH-KSC method with
OSLOM in detail. From Tables 4 and 6 we observe that the
MH-KSC technique outperforms OSLOM w.r.t. both quality
metrics for the Fb, Enr, Imdb and Utube networks while OSLOM
does the same only for the Cond network. In case of the PGP, Cond
and Epn networks OSLOM results in better Q than MH-KSC.
However, the MH-KSC approach has a better CC value for the PGP
and Epn networks. For large scale networks like Enr, Imdb and
Utube, OSLOM cannot identify good quality coarser clusters, i.e.
the number of clusters detected is always >1000.

Visualization and Illustrations 

We provide a tree based visualization of the multilevel
hierarchical organization for the Fb and Enr networks in Figure 8.
The hierarchical structure is depicted as a tree for the Fb and Enr
networks in Figures 8a and 8b respectively.

We plot the results corresponding to fine, intermediate and
coarse levels of hierarchy for the PGP network using the software
provided in [21]. The software requires all the nodes in the
network along with 2 levels of hierarchy. In Figure 9 we plot the
results for the PGP network corresponding to the MH-KSC algorithm
using 2 fine, 4 intermediate and 2 coarse levels of the hierarchical
organization. For the Louvain method we use the 3rd and 4th levels
of hierarchy as inputs for the fine clusters, the 4th and 5th levels
of hierarchy as inputs for the intermediate clusters and the 5th and
6th levels of hierarchy as inputs for plotting the coarsest clusters.
The Infomap method only generates 2 levels of hierarchy, which
correspond to a plot for coarse clusters. Similarly, for OSLOM we
plot coarse and fine clusters. The results for the Louvain, Infomap
and OSLOM methods are depicted in Figure 10.

Figures 9 and 10 show that the MH-KSC algorithm can depict
richer structures than the other methods. It has more flexibility
and allows visualization at coarser, intermediate and finer
levels of granularity. From Figures 10a, 10b, 10c and Table 5, we
observe that the Louvain method can only detect quality clusters at
coarser levels of granularity and cannot detect fewer than 100
communities, while the Infomap method can only locate giant
connected components for the PGP network as observed from
Figure 10d and Table 6. The OSLOM method also seems to work
reasonably well as observed from Figures 10e and 10f. However, it
detects fewer levels of hierarchy and thus offers less flexibility
in the selection of the level of hierarchy than the proposed
MH-KSC approach.

We provide a visualization of the 2 best layers of hierarchy for
the Epn network based on the Q and the CC criterion for MH-KSC,